Genomics and phenomics of body mass index reveals a complex disease network

Elevated body mass index (BMI) is heritable and associated with many health conditions that impact morbidity and mortality. The study of the genetic association of BMI across a broad range of common disease conditions offers the opportunity to extend current knowledge regarding the breadth and depth of adiposity-related diseases. We identify 906 (364 novel) and 41 (6 novel) genome-wide significant loci for BMI among participants of European (N~1.1 million) and African (N~100,000) ancestry, respectively. Using a BMI genetic risk score including 2446 variants, 316 diagnoses are associated in the Million Veteran Program, with 96.5% showing increased risk. A co-morbidity network analysis reveals seven disease communities containing multiple interconnected diseases associated with BMI as well as extensive connections across communities. Mendelian randomization analysis confirms numerous phenotypes across a breadth of organ systems, including conditions of the circulatory (heart failure, ischemic heart disease, atrial fibrillation), genitourinary (chronic renal failure), respiratory (respiratory failure, asthma), musculoskeletal and dermatologic systems that are deeply interconnected within and across the disease communities. This work shows that the complex genetic architecture of BMI associates with a broad range of major health conditions, supporting the need for comprehensive approaches to prevent and treat obesity.


Study Populations
MVP Cohort: The design of the MVP has been previously described. 1 Briefly, individuals aged 19 to 104 years with the mean age of 62 years have been recruited from over 60 Veterans Health Administration medical centers nationwide since 2011. Each veteran's EHR is being integrated into the MVP biorepository, including inpatient International Classification of Diseases (ICD9/10) diagnosis codes, Current Procedural Terminology (CPT) procedure codes, clinical laboratory measurements, and reports of diagnostic imaging modalities. MVP has received ethical and study protocol approval by the VA Central Institutional Review Board in accordance with the principles outlined in the Declaration of Helsinki.
UK Biobank: UK Biobank is one of the largest and a detailed prospective study with over 500,000 participants aged 40-69 years recruited in 2006-2010. 2 The UK Biobank design including access to genetic data is well described in published reports 2 and in a publicly available website (see UK Biobank). The study has collected extensive phenotypic and genotypic data about its participants, including questionnaires, physical measures, sample assays, accelerometry, multimodal imaging, genome-wide genotyping and longitudinal follow-up for a wide range of health-related outcomes. We used UK Biobank only GWAS summary statistics from a recent UK Biobank GWAS for BMI that was included in a meta-analysis of the GIANT Consortium and UK Biobank BMI. 3 GIANT Consortium: The Genetic Investigation of Anthropometric Traits (GIANT) consortium is an international collaboration that seeks to identify genetic loci that modulate human body size and shape, including height and measures of obesity. We have obtained the summary statistics of the most recent BMI-GWAS results from the GIANT consortium as evidence for replication. 4,5 As previously described, 3 the GIANT GWAS meta-analysis for common variants consisted of a two-stage meta-analysis to identify BMI-associated loci in adults of European ancestry. Stage 1 of the meta-analysis was performed across 80 GWAS studies (n=234,069) and stage 2 used data from 34 additional GWAS studies (n=88,137) genotyped using Metabochip7. Fixed effects metaanalyses were conducted using the inverse variance-weighted method implemented in METAL. All contributing GWAS common SNPs were imputed using the HapMap phase II CEU reference panel for European-descent studies. Study-specific GWAS results as well as GWAS meta-analysis results were corrected for genomic control.
The AAAGC: To replicate the results from the MVP for African ancestry, we have obtained the summary results from the most recent and largest African ancestry-based BMI-GWAS published by the African Ancestry Anthropometry Genetics Consortium (AAAGC). 6 For the AAAGC analysis, 5 a three-stage design was used to evaluate genetic associations with BMI. Stage 1 included GWAS meta-analyses in AA individuals and stage 2 included replication of top associations from stage 1; stage 3 included metaanalysis of top associations from stages 1 and 2 AA studies. The discovery stage 1 of AAAGC used 17 GWAS studies of up to n=42,752 AA individuals for BMI analyses. Stage 2 replication for BMI was performed in an additional n=10,143 AA individuals from AAAGC followed by meta-analysis with EA individuals from the GIANT consortium (n=322,154 for BMI). Contributing SNPs were imputed using 1000 Genomes imputation.
Quality Control Analysis and Imputation DNA extracted from participants' blood was genotyped using a customized Affymetrix Axiom® biobank array, the MVP 1.0 Genotyping Array. The array was enriched for both common and rare genetic variants of clinical significance in different ancestral backgrounds. Quality-control procedures used to assign ancestry, remove low-quality samples and variants, and perform genotype imputation were previously described 7 and briefly summarized. We excluded: duplicate samples, samples with more heterozygosity than expected, an excess (>2.5%) of missing genotype calls, or discordance between genetically inferred sex and phenotypic gender. 7 In addition, one individual from each pair of related individuals (more than second degree relatedness as measured by the KING 8 software) were removed. Prior to imputation, variants that were poorly called or that deviated from their expected allele frequency based on reference data from the 1000 Genomes Project 9 were excluded. After pre-phasing using EAGLE 10 v2, genotypes from the 1000 Genomes Project 9 phase 3, version 5 reference panel were imputed into Million Veteran Program (MVP) participants via Minimac3 software 11 . Global and ancestryspecific principal component analysis (PCA) was performed using the flashPCA software. Principal component analysis plots of all MVP participants is provided in Supplementary Figure 1 to present the genetic heterogeneity of the MVP participants and relative homogeneity within European ancestry and African ancestry groups (see the definition in the next section). Following imputation, variant level quality control was performed using the EasyQC R package 12 (www.R-project.org), and exclusion metrics included: ancestry specific Hardy-Weinberg equilibrium 13 p-value <1x10 -20 , posterior call probability < 0.9, imputation quality <0.3, minor allele frequency (MAF) < 0.0003, call rate < 97.5% for common variants (MAF > 1%), and call rate < 99% for rare variants (MAF < 1%). Variants were also excluded if they deviated > 10% from their expected allele frequency based on reference data from the 1000 Genomes Project 9 .

Admixture analysis
We first extracted ancestry information from both self-reported and inferred sources. For self-reported ancestry, we extracted information on self-reported ethnicity and race from questions 4 and 5, respectively, from the MVP baseline survey administered to participants as a part of enrollment. For genetically-inferred ancestry, we ran the program ADMIXTURE 14 in the supervised mode using five sub-populations from the 1000 Genomes Phase 3 dataset 9 . The five reference populations are: CHB for Han Chinese in Beijing, GBR for British in England and Scotland, LWK for Luhya in Webuye, PEL Peruvians from Lima, YRI Yoruba in Ibadan, Nigeria. They represent East Asia, Europe, Eastern Africa, America, and Western Africa, respectively. We compared groups of individuals self-identifying as "White" or "White, non-Hispanic" to the fraction of their ancestry that aligned with the GBR reference populations. On top of the phenotypic definition, we retained samples with > 50% of their genome aligning with GBR for genetically European ancestry and samples with > 50% of their genome aligning with LWK or YRI for genetically African ancestry.

Functional Analysis of BMI-associated SNPs
We used FUMA (http://fuma.ctglab.nl/) to analyze the functional relevance of the novel loci identified from the combined meta-analysis as well as variants that are in high linkage disequilibrium with the lead SNPs (r 2 > 0.6, p<1× 10 -5 ). Genome-wide significant SNPs (p < 5 ×10 -8 ) were grouped into a genomic locus if they were not independent from each other at r2 > 0.1 or physically close (distance < 500kb). Lead SNPs were defined within each locus if they were independent (r 2 < 0.1) and genome-wide significant. For non-Hispanic European ancestry, over 1000 protein-coding genes were positional mapped by the novel loci, and the top mapped genes include UNC79, COX8C, UBN1, PPL, ABHD17A and PLEKHJ1. Most of the variants were annotated as intronic (42.9%) or intergenic (39.3%), while 1%, 0.46% and 1.3% were annotated as exonic, at 5' UTR and at 3' UTR, respectively. Further exploration of the functional consequences of mapped exonic variants has prioritized 75 genes with non-synonymous variations. We also performed eQTL mapping based on 48 tissue types from GTEx V7. 611 genes were mapped to eQTLs (FDR q<0.05) and the top tissues involved were tibial nerve, thyroid and skeletal muscle. Differentially expressed gene enrichment analyses have shown significant (Bonferroni corrected p<0.05) up-regulation for thyroid, pituitary, brain and prostate tissues, and down-regulation for heart, liver, adrenal gland, muscle and blood tissues. Pathway analyses curated from Gene Ontology have revealed top associations with neurogenesis, behavior and carbohydrate derivative metabolic process. For non-Hispanic African ancestry, the topped positional mapped genes for the 6 novel loci were RAI1, SREBF1, MOB1B and DCK. No exonic variants were identified. 12 genes (FDR q<0.05) were identified through eQTL mapping, thyroid, esophagus mucosa, skeletal muscle and tibial nerve tissues were mapped with the strongest associations. Differentially expressed gene enrichment analyses have shown significant (Bonferroni corrected p<0.05) down-regulation for pancreas and small intestine tissues. Top biological functions identified from the pathway analysis include regulation of cell polarity, regulation of protein targeting to mitochondrion, circadian rhythm and regulation of intracellular protein transport.

PheWAS Quality Control, Disease Definitions, and Association Analysis
Of 353,323 genotyped veterans, participants were included in the phenome-wide analysis if the electronic health record reflected 2 or more separate encounters in the VA Healthcare System in each of the two years prior to enrollment in MVP. We included 21,209,658 prevalent diagnosis codes in the PheWAS analysis. We focused on the European ancestry, in which the mean age was 63.95 ± 13.11 years, and 93.0% were male.
Diagnosis codes were collapsed to clinical disease groups and corresponding controls using the groupings proposed by Denny et al 15 . Diseases were required to have a prevalence of ≥ 200 cases and 200 controls to be included in the phenome-wide analysis. Each polygenic risk score (PRS) was tested using logistic regression adjusting for age, sex, and ten principal components using the PheWAS R package (https://github.com/PheWAS/PheWAS) in R v3.2.0 (www.R-project.org). In total, 1,244 and 833 disease phenotypes (phecodes) were available for analysis in 174,531 EA and 49,695 AA participants, respectively. We used the same set of phecodes included in the PheWAS 15 and therefore applied the same threshold for significance. We first ran linear regression with BMI as the dependent variable and GRSBMI as the independent variable to calculate genetically instrumented BMI, then logistic regression was conducted to estimate the association between genetically instrumented BMI and phecodes. Genetically instrumented BMI was standardized to the scale of raw BMI to compare the effect from the MR analysis with associations obtained from observational studies. Thus, all odds ratios (ORs) were scaled by per standard deviation (SD) of BMI. We controlled for age, sex and principle components in both stages. 16 Because the sample size was larger and the genetic instrument (i.e., GRSBMI) was stronger in the EA sample than in the AA sample, we present the phenome-wide MR results in EAs as our primary analysis.

Results for PheWAS and Network analysis
An additional 197 phecodes from 16 disease systems were associated with BMI but were not associated (p>0.05) with genetically influenced BMI (Supplementary Table 6). Of these BMI-associated phecodes, 60.4% (n=119) were noted to be inversely associated with BMI (Supplementary Table 6). The finding of a non-significant association in the GRSBMI-based MR is inconsistent with a significant increased risk or protective role of BMI but may suggest reverse causality. As there was no prior published evidence for many of the MR associations, many of our MR findings in individuals of European ancestry are novel.
We identified seven disease communities, which were comprised of diseases from multiple disease systems (e.g., Community A: circulatory, endocrine, nervous systems, genitourinary, and general symptoms). In addition, some communities are dominated by closed related clinical conditions. Community A (Supplementary Figure 7A) included 11 disease codes related to sensory organs, but mostly related to eye disorders such as cataract, glaucoma, inflammation of the eye, and other retinal disorders. Community C contains vascular diseases and skin diseases (Supplementary Figure 7C). Community D includes mostly heart disease codes such as ischemic heart disease, congestive heart failure and chest pain. Community G includes renal diseases such as renal failure, nephritis and proteinuria.
Morbid obesity, associated with an increased burden of comorbid conditions in our study, is well known to confer increased risk for morbidity and mortality in hospitalized patients, and obesity is prevalent in patients hospitalized with acute illness including COVID-19 patients, 17 in whom acute respiratory failure, cardiac failure and renal failure are common.
To evaluate the impact of bi-directional Mendelian randomization (MR) of traits reported in our MR PheWAS studies of BMI, we selected 10 traits across a broad range of disease areas for which GWAS summary data has been curated and centralized by the Medical Research Council Integrative Epidemiology Unit (MRC-IEU) open GWAS database (https://gwas.mrcieu.ac.uk). The list of the 10 traits is provided in the Table below. We used the BMI GWAS published by the GIANT consortium in 2015 (GWAS ID: ieu-a-2), which is widely used as a global reference for many exemplary analyses including MR. This GWAS does not use samples overlapping with UK Biobank so as to avoid potential bias in estimating causal effects. We used R package 'TwoSampleMR' to run the bidirectional MR. This package by default applied clumping to identify independent genetic variants for each individual GWAS summary data. Clumps are formed around index variants with P-value < 5e-8. Unsurprisingly, there is a strong bidirectional (twoway) "causal" effect between body mass index and type 2 diabetes. However, none of the other nine traits demonstrated evidence of a significant bidirectional effect to BMI, defined as inverse variance weighted (IVW) -based MR P-value < 0.005 (corrected for 10 selected traits). We recognize that a more systematic bi-directional MR analysis would be needed to evaluate a greater set of traits and our analysis is preliminary. For example, we did not include sensitivity analysis using Steiger filtering. 18 However, we believe our analysis provides an exploratory framework for assuming that significant bidirectional MR may be uncommon. We note that there is considerable inconsistency of available genetic instruments (i.e., genetic risk scores) for many conditions in PheWAS. A more comprehensive analysis is being pursed in other, future research projects, and would be beyond the scope of our current project.
Supplementary Table 1